Statistical Estimator


Constant Nullspace Strong Convexity and Fast Convergence of Proximal Methods under High-Dimensional Settings

Yen, Ian En-Hsu, Hsieh, Cho-Jui, Ravikumar, Pradeep K., Dhillon, Inderjit S.

Neural Information Processing Systems

State-of-the-art statistical estimators for high-dimensional problems take the form of regularized, and hence non-smooth, convex programs. A key facet of these statistical estimation problems is that they are typically not strongly convex under a high-dimensional sampling regime, where the Hessian matrix becomes rank-deficient. Under vanilla convexity, however, proximal optimization methods attain only a sublinear rate. In this paper, we investigate a novel variant of strong convexity, which we call Constant Nullspace Strong Convexity (CNSC), where we require that the objective function be strongly convex only over a constant subspace. As we show, the CNSC condition is naturally satisfied by high-dimensional statistical estimators. We then analyze the behavior of proximal methods under this CNSC condition: we show global linear convergence of Proximal Gradient and local quadratic convergence of Proximal Newton Method, when the regularization function comprising the statistical estimator is decomposable. We corroborate our theory via numerical experiments, and show a qualitative difference in the convergence rates of the proximal algorithms when the loss function does satisfy the CNSC condition.
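The decomposable-regularizer setting the abstract describes is illustrated by the lasso, whose proximal operator is soft-thresholding. Below is a minimal numpy sketch of proximal gradient (ISTA) in the rank-deficient p > n regime; it is illustrative only, not the paper's code, and the problem sizes and regularization weight are arbitrary choices:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each coordinate toward zero
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, step, iters=500):
    # ISTA: x_{k+1} = prox_{step*lam*||.||_1}(x_k - step * A^T (A x_k - b))
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
n, p = 50, 100                            # p > n: the Hessian A^T A is rank-deficient
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:5] = 1.0                          # sparse ground truth
b = A @ x_true
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, with L the gradient's Lipschitz constant
x_hat = proximal_gradient_lasso(A, b, lam=0.1, step=step)
```

Even though the smooth part is not strongly convex here, the iterates make steady progress on the composite objective, which is the regime the CNSC analysis addresses.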


Synthetic Data: Can We Trust Statistical Estimators?

Decruyenaere, Alexander, Dehaene, Heidelinde, Rabaey, Paloma, Polet, Christiaan, Decruyenaere, Johan, Vansteelandt, Stijn, Demeester, Thomas

arXiv.org Machine Learning

The increasing interest in data sharing makes synthetic data appealing. However, the analysis of synthetic data raises a unique set of methodological challenges. In this work, we highlight the importance of inferential utility and provide empirical evidence against naive inference from synthetic data (which treats synthetic data as if they were actually observed). We argue that the rate of false-positive findings (type 1 error) will be unacceptably high, even when the estimates are unbiased. One of the reasons is the underestimation of the true standard error, which may even progressively increase with larger sample sizes due to slower convergence. This is especially problematic for deep generative models. Before publishing synthetic data, it is essential to develop statistical inference tools for such data.
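The standard-error underestimation can be seen in a toy Monte Carlo (an illustrative construction, not the paper's experiments): a Gaussian "generative model" is fitted to real data, a synthetic sample is drawn, and the naive standard error of the synthetic-sample mean is compared against the estimator's actual spread across repetitions.

```python
import numpy as np

# Toy illustration: naive inference on synthetic data ignores the uncertainty of the
# fitted generative model, so the reported standard error understates the true one.
rng = np.random.default_rng(1)
n, reps = 100, 2000
syn_means, naive_ses = [], []
for _ in range(reps):
    real = rng.standard_normal(n)                     # "observed" data ~ N(0, 1)
    m, s = real.mean(), real.std(ddof=1)              # fitted generative model N(m, s)
    synth = rng.normal(m, s, size=n)                  # synthetic sample, same size
    syn_means.append(synth.mean())                    # analyst's point estimate
    naive_ses.append(synth.std(ddof=1) / np.sqrt(n))  # naive SE, as if data were real
true_se = np.std(syn_means)    # actual sampling spread of the estimator
naive_se = np.mean(naive_ses)  # what naive analysis reports (smaller by ~sqrt(2) here)
```

In this simple setup the synthetic-sample mean carries both the generative model's estimation error and the fresh sampling noise, so its true standard error is roughly sqrt(2) times what the naive analysis reports, which is exactly the mechanism that inflates type 1 error.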


Propensity score models are better when post-calibrated

Gutman, Rom, Karavani, Ehud, Shimoni, Yishai

arXiv.org Machine Learning

The propensity score is defined as the conditional probability of being assigned to a treatment (exposure) given one's observed confounding variables. It is very commonly used in methods for estimating causal effects from observational data, such as inverse probability weighting [1], propensity matching [2, 3], propensity stratification [4], as well as many doubly-robust methods [5, 6, 7, 8]. Rosenbaum and Rubin [2] established theoretical guarantees ensuring that adjusting for the propensity score, instead of the covariates themselves, is sufficient to achieve the conditional exchangeability needed to estimate a causal effect. However, while these theoretical guarantees require the true conditional probabilities, in practice not every model that takes data as input and outputs a number between zero and one correctly estimates true probabilities. The scores might not reliably represent true probabilities. A prediction model that accurately outputs probabilities is referred to as calibrated (note this is unrelated to the earlier notion of "propensity score calibration" from [9]). Calibration can be empirically evaluated with calibration curves (also called reliability curves), which compare the predicted scores with their corresponding rate of labels [10].
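A binned calibration check of the kind described above can be sketched in plain numpy. The setup below is invented for illustration (it is not the paper's evaluation): an "overconfident" propensity model is simulated by inflating the true logits, and the per-bin comparison of mean score versus observed treatment rate exposes the miscalibration.

```python
import numpy as np

# Illustrative setup: known true treatment probabilities and a deliberately
# overconfident model whose scores push toward 0 and 1.
rng = np.random.default_rng(2)
n = 50_000
p_true = rng.uniform(0.05, 0.95, size=n)       # true treatment probabilities
treated = rng.random(n) < p_true               # observed treatment assignment
logit = np.log(p_true / (1 - p_true))
p_model = 1 / (1 + np.exp(-2.0 * logit))       # overconfident scores (inflated logits)

# Binned calibration curve: a calibrated model has mean score close to the
# observed treatment rate within each bin.
bins = np.linspace(0, 1, 11)
idx = np.digitize(p_model, bins) - 1
for b in range(10):
    mask = idx == b
    if mask.any():
        print(f"bin {b}: mean score {p_model[mask].mean():.2f}, "
              f"treated rate {treated[mask].mean():.2f}")
```

For the low-score bins the observed treatment rate sits well above the mean predicted score, which is the gap a post-calibration step (e.g. Platt scaling or isotonic regression) would close before the scores are used as weights.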


DoWhy: An End-to-End Library for Causal Inference

Sharma, Amit, Kiciman, Emre

arXiv.org Artificial Intelligence

In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on the task of providing powerful statistical estimators. We describe DoWhy, an open-source Python library that is built with causal assumptions as first-class citizens, based on the formal framework of causal graphs to specify and test causal assumptions. DoWhy presents an API for the four steps common to any causal analysis---1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML for the estimation step. The library is available at https://github.com/microsoft/dowhy
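The estimation and refutation steps (3 and 4 above) can be sketched conceptually in plain numpy; this does not use DoWhy's actual API, and the data-generating process below is invented for illustration: a single confounder, a known propensity score, inverse probability weighting for the effect estimate, and a placebo-treatment refutation.

```python
import numpy as np

# Invented example, not DoWhy code: one confounder x, treatment t, outcome y.
rng = np.random.default_rng(3)
n = 20_000
x = rng.standard_normal(n)                      # observed confounder
p = 1 / (1 + np.exp(-x))                        # true propensity P(T=1 | x)
t = (rng.random(n) < p).astype(float)           # treatment assignment
y = 2.0 * t + x + rng.standard_normal(n)        # outcome; true effect is 2

# Step 3: inverse probability weighting with the (here, known) propensity score.
w = t / p + (1 - t) / (1 - p)
ate = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))

# Step 4: placebo refutation -- replace treatment with random noise;
# the estimated "effect" should collapse toward zero.
t_placebo = (rng.random(n) < 0.5).astype(float)
ate_placebo = y[t_placebo == 1].mean() - y[t_placebo == 0].mean()
```

DoWhy wraps this workflow (plus the modeling and identification steps) behind a causal-graph-based API and lets the estimator come from external libraries such as EconML or CausalML.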


Convergence Rates for Differentially Private Statistical Estimation

Chaudhuri, Kamalika, Hsu, Daniel

arXiv.org Machine Learning

Differential privacy is a cryptographically-motivated definition of privacy which has gained significant attention over the past few years. Differentially private solutions enforce privacy by adding random noise to a function computed over the data, and the challenge in designing such algorithms is to control the added noise in order to optimize the privacy-accuracy-sample size tradeoff. This work studies differentially-private statistical estimation, and shows upper and lower bounds on the convergence rates of differentially private approximations to statistical estimators. Our results reveal a formal connection between differential privacy and the notion of Gross Error Sensitivity (GES) in robust statistics, by showing that the convergence rate of any differentially private approximation to an estimator that is accurate over a large class of distributions has to grow with the GES of the estimator. We then provide an upper bound on the convergence rate of a differentially private approximation to an estimator with bounded range and bounded GES. We show that the bounded range condition is necessary if we wish to ensure a strict form of differential privacy.
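The "adding random noise to a function computed over the data" mechanism can be made concrete with the standard Laplace mechanism for a bounded-range mean (a generic differential-privacy sketch, not the paper's estimator-specific construction; the clipping range and epsilon are arbitrary choices):

```python
import numpy as np

def dp_mean(data, lo, hi, epsilon, rng):
    # Laplace mechanism: clip to [lo, hi] so the mean's sensitivity is bounded,
    # then add noise scaled to sensitivity / epsilon.
    clipped = np.clip(data, lo, hi)
    n = len(clipped)
    sensitivity = (hi - lo) / n    # one record can move the mean by at most this
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(4)
data = rng.normal(loc=5.0, scale=1.0, size=10_000)
est = dp_mean(data, lo=0.0, hi=10.0, epsilon=1.0, rng=rng)
```

The noise scale is proportional to 1/(n * epsilon), which is the privacy-accuracy-sample size tradeoff the abstract refers to: stronger privacy (smaller epsilon) or fewer samples forces more noise, and estimators with larger sensitivity (related to the Gross Error Sensitivity in the paper's analysis) pay a correspondingly larger accuracy cost.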